{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# KDNuggets Transfer Learning Blog Post\n",
    "\n",
    "This is the companion notebook to the MLDB.ai guest blog post on KGNuggets.\n",
    "\n",
    "The post will soon be published. If you want to try an interactive version of this notebook, simply [signup for a free account](https://mldb.ai/#signup).\n",
    "\n",
    "### Import some libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pymldb import Connection\n",
    "mldb = Connection()\n",
    "\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Inception on MLDB\n",
    "\n",
    "We start by creating the `inception` function that we can call to run the trained Inception-V3 TensorFlow model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "print mldb.put('/v1/functions/inception', {\n",
    "    \"type\": 'tensorflow.graph',\n",
    "    \"params\": {\n",
    "        \"modelFileUrl\": 'archive+'+\n",
    "            'http://public.mldb.ai/models/inception_dec_2015.zip'+\n",
    "            '#tensorflow_inception_graph.pb',\n",
    "        \"inputs\": 'fetcher(url)[content] AS \"DecodeJpeg/contents\"',\n",
    "        \"outputs\": \"pool_3\"\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We can then use it to embed any image in the representation it learned. Let's try doing this to the KDNuggets logo:\n",
    "\n",
    "<img src=\"http://www.skytree.net/wp-content/uploads/2014/08/KDnuggets.jpg\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pool_3.0.0.0.0</th>\n",
       "      <th>pool_3.0.0.0.1</th>\n",
       "      <th>pool_3.0.0.0.2</th>\n",
       "      <th>pool_3.0.0.0.3</th>\n",
       "      <th>pool_3.0.0.0.4</th>\n",
       "      <th>pool_3.0.0.0.5</th>\n",
       "      <th>pool_3.0.0.0.6</th>\n",
       "      <th>pool_3.0.0.0.7</th>\n",
       "      <th>pool_3.0.0.0.8</th>\n",
       "      <th>pool_3.0.0.0.9</th>\n",
       "      <th>...</th>\n",
       "      <th>pool_3.0.0.0.2038</th>\n",
       "      <th>pool_3.0.0.0.2039</th>\n",
       "      <th>pool_3.0.0.0.2040</th>\n",
       "      <th>pool_3.0.0.0.2041</th>\n",
       "      <th>pool_3.0.0.0.2042</th>\n",
       "      <th>pool_3.0.0.0.2043</th>\n",
       "      <th>pool_3.0.0.0.2044</th>\n",
       "      <th>pool_3.0.0.0.2045</th>\n",
       "      <th>pool_3.0.0.0.2046</th>\n",
       "      <th>pool_3.0.0.0.2047</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>result</th>\n",
       "      <td>0.405393</td>\n",
       "      <td>0.073578</td>\n",
       "      <td>0.063868</td>\n",
       "      <td>0.133508</td>\n",
       "      <td>0.044338</td>\n",
       "      <td>0.002757</td>\n",
       "      <td>0.579667</td>\n",
       "      <td>0.012046</td>\n",
       "      <td>0.74275</td>\n",
       "      <td>0.862201</td>\n",
       "      <td>...</td>\n",
       "      <td>0.570614</td>\n",
       "      <td>0.245445</td>\n",
       "      <td>0.192202</td>\n",
       "      <td>0.772916</td>\n",
       "      <td>0.002887</td>\n",
       "      <td>0.424597</td>\n",
       "      <td>0.018911</td>\n",
       "      <td>0.035651</td>\n",
       "      <td>0.114374</td>\n",
       "      <td>1.145283</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 2048 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          pool_3.0.0.0.0  pool_3.0.0.0.1  pool_3.0.0.0.2  pool_3.0.0.0.3  \\\n",
       "_rowName                                                                   \n",
       "result          0.405393        0.073578        0.063868        0.133508   \n",
       "\n",
       "          pool_3.0.0.0.4  pool_3.0.0.0.5  pool_3.0.0.0.6  pool_3.0.0.0.7  \\\n",
       "_rowName                                                                   \n",
       "result          0.044338        0.002757        0.579667        0.012046   \n",
       "\n",
       "          pool_3.0.0.0.8  pool_3.0.0.0.9        ...          \\\n",
       "_rowName                                        ...           \n",
       "result           0.74275        0.862201        ...           \n",
       "\n",
       "          pool_3.0.0.0.2038  pool_3.0.0.0.2039  pool_3.0.0.0.2040  \\\n",
       "_rowName                                                            \n",
       "result             0.570614           0.245445           0.192202   \n",
       "\n",
       "          pool_3.0.0.0.2041  pool_3.0.0.0.2042  pool_3.0.0.0.2043  \\\n",
       "_rowName                                                            \n",
       "result             0.772916           0.002887           0.424597   \n",
       "\n",
       "          pool_3.0.0.0.2044  pool_3.0.0.0.2045  pool_3.0.0.0.2046  \\\n",
       "_rowName                                                            \n",
       "result             0.018911           0.035651           0.114374   \n",
       "\n",
       "          pool_3.0.0.0.2047  \n",
       "_rowName                     \n",
       "result             1.145283  \n",
       "\n",
       "[1 rows x 2048 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kdNuggets = \"http://www.skytree.net/wp-content/uploads/2014/08/KDnuggets.jpg\"\n",
    "\n",
    "mldb.query(\"SELECT inception({url: '%s'}) as *\" % kdNuggets)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Loading our dataset\n",
    "\n",
    "We now load a CSV dataset that contains links to car images from three brands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>brand</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>audi</td>\n",
       "      <td>https://s3.amazonaws.com/public-mldb-ai/datase...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>audi</td>\n",
       "      <td>https://s3.amazonaws.com/public-mldb-ai/datase...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>audi</td>\n",
       "      <td>https://s3.amazonaws.com/public-mldb-ai/datase...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         brand                                                url\n",
       "_rowName                                                         \n",
       "2         audi  https://s3.amazonaws.com/public-mldb-ai/datase...\n",
       "3         audi  https://s3.amazonaws.com/public-mldb-ai/datase...\n",
       "4         audi  https://s3.amazonaws.com/public-mldb-ai/datase..."
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print mldb.post(\"/v1/procedures\", {\n",
    "    \"type\": \"import.text\",\n",
    "    \"params\": {\n",
    "        \"dataFileUrl\": \"https://public.mldb.ai/datasets/car_brand_images/cars_urls.csv\",\n",
    "        \"outputDataset\": \"images\"\n",
    "    }\n",
    "})\n",
    "\n",
    "mldb.query(\"SELECT * FROM images LIMIT 3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's get a sense of how many images we have in each class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count(*)</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>\"[\"\"audi\"\"]\"</th>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>\"[\"\"bmw\"\"]\"</th>\n",
       "      <td>72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>\"[\"\"tesla\"\"]\"</th>\n",
       "      <td>72</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               count(*)\n",
       "_rowName               \n",
       "\"[\"\"audi\"\"]\"         69\n",
       "\"[\"\"bmw\"\"]\"          72\n",
       "\"[\"\"tesla\"\"]\"        72"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"SELECT count(*) FROM images GROUP BY brand\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We can easily run a few images through the network like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pool_3.0.0.0.0</th>\n",
       "      <th>pool_3.0.0.0.1</th>\n",
       "      <th>pool_3.0.0.0.2</th>\n",
       "      <th>pool_3.0.0.0.3</th>\n",
       "      <th>pool_3.0.0.0.4</th>\n",
       "      <th>pool_3.0.0.0.5</th>\n",
       "      <th>pool_3.0.0.0.6</th>\n",
       "      <th>pool_3.0.0.0.7</th>\n",
       "      <th>pool_3.0.0.0.8</th>\n",
       "      <th>pool_3.0.0.0.9</th>\n",
       "      <th>...</th>\n",
       "      <th>pool_3.0.0.0.2038</th>\n",
       "      <th>pool_3.0.0.0.2039</th>\n",
       "      <th>pool_3.0.0.0.2040</th>\n",
       "      <th>pool_3.0.0.0.2041</th>\n",
       "      <th>pool_3.0.0.0.2042</th>\n",
       "      <th>pool_3.0.0.0.2043</th>\n",
       "      <th>pool_3.0.0.0.2044</th>\n",
       "      <th>pool_3.0.0.0.2045</th>\n",
       "      <th>pool_3.0.0.0.2046</th>\n",
       "      <th>pool_3.0.0.0.2047</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.132836</td>\n",
       "      <td>0.267571</td>\n",
       "      <td>0.460861</td>\n",
       "      <td>0.073345</td>\n",
       "      <td>0.319553</td>\n",
       "      <td>0.245528</td>\n",
       "      <td>0.222645</td>\n",
       "      <td>0.076144</td>\n",
       "      <td>0.266402</td>\n",
       "      <td>0.269410</td>\n",
       "      <td>...</td>\n",
       "      <td>0.517730</td>\n",
       "      <td>0.324239</td>\n",
       "      <td>0.182698</td>\n",
       "      <td>0.589910</td>\n",
       "      <td>0.296351</td>\n",
       "      <td>0.356238</td>\n",
       "      <td>0.185801</td>\n",
       "      <td>0.236495</td>\n",
       "      <td>0.853976</td>\n",
       "      <td>0.004721</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.371351</td>\n",
       "      <td>0.253051</td>\n",
       "      <td>0.222188</td>\n",
       "      <td>0.131357</td>\n",
       "      <td>0.352972</td>\n",
       "      <td>0.163278</td>\n",
       "      <td>0.205562</td>\n",
       "      <td>0.148676</td>\n",
       "      <td>1.125986</td>\n",
       "      <td>0.027997</td>\n",
       "      <td>...</td>\n",
       "      <td>0.005888</td>\n",
       "      <td>0.139715</td>\n",
       "      <td>0.103083</td>\n",
       "      <td>0.718988</td>\n",
       "      <td>0.326615</td>\n",
       "      <td>0.118558</td>\n",
       "      <td>0.087323</td>\n",
       "      <td>0.117636</td>\n",
       "      <td>0.382889</td>\n",
       "      <td>0.024768</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.462379</td>\n",
       "      <td>0.433509</td>\n",
       "      <td>0.411765</td>\n",
       "      <td>0.157537</td>\n",
       "      <td>0.277709</td>\n",
       "      <td>0.221781</td>\n",
       "      <td>0.190680</td>\n",
       "      <td>0.045631</td>\n",
       "      <td>0.885003</td>\n",
       "      <td>0.105056</td>\n",
       "      <td>...</td>\n",
       "      <td>0.046998</td>\n",
       "      <td>0.388228</td>\n",
       "      <td>0.573246</td>\n",
       "      <td>0.877034</td>\n",
       "      <td>0.491815</td>\n",
       "      <td>0.191424</td>\n",
       "      <td>0.259622</td>\n",
       "      <td>0.402484</td>\n",
       "      <td>0.741471</td>\n",
       "      <td>0.015490</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 2048 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          pool_3.0.0.0.0  pool_3.0.0.0.1  pool_3.0.0.0.2  pool_3.0.0.0.3  \\\n",
       "_rowName                                                                   \n",
       "2               0.132836        0.267571        0.460861        0.073345   \n",
       "3               0.371351        0.253051        0.222188        0.131357   \n",
       "4               0.462379        0.433509        0.411765        0.157537   \n",
       "\n",
       "          pool_3.0.0.0.4  pool_3.0.0.0.5  pool_3.0.0.0.6  pool_3.0.0.0.7  \\\n",
       "_rowName                                                                   \n",
       "2               0.319553        0.245528        0.222645        0.076144   \n",
       "3               0.352972        0.163278        0.205562        0.148676   \n",
       "4               0.277709        0.221781        0.190680        0.045631   \n",
       "\n",
       "          pool_3.0.0.0.8  pool_3.0.0.0.9        ...          \\\n",
       "_rowName                                        ...           \n",
       "2               0.266402        0.269410        ...           \n",
       "3               1.125986        0.027997        ...           \n",
       "4               0.885003        0.105056        ...           \n",
       "\n",
       "          pool_3.0.0.0.2038  pool_3.0.0.0.2039  pool_3.0.0.0.2040  \\\n",
       "_rowName                                                            \n",
       "2                  0.517730           0.324239           0.182698   \n",
       "3                  0.005888           0.139715           0.103083   \n",
       "4                  0.046998           0.388228           0.573246   \n",
       "\n",
       "          pool_3.0.0.0.2041  pool_3.0.0.0.2042  pool_3.0.0.0.2043  \\\n",
       "_rowName                                                            \n",
       "2                  0.589910           0.296351           0.356238   \n",
       "3                  0.718988           0.326615           0.118558   \n",
       "4                  0.877034           0.491815           0.191424   \n",
       "\n",
       "          pool_3.0.0.0.2044  pool_3.0.0.0.2045  pool_3.0.0.0.2046  \\\n",
       "_rowName                                                            \n",
       "2                  0.185801           0.236495           0.853976   \n",
       "3                  0.087323           0.117636           0.382889   \n",
       "4                  0.259622           0.402484           0.741471   \n",
       "\n",
       "          pool_3.0.0.0.2047  \n",
       "_rowName                     \n",
       "2                  0.004721  \n",
       "3                  0.024768  \n",
       "4                  0.015490  \n",
       "\n",
       "[3 rows x 2048 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"SELECT inception({url: url}) AS * FROM images LIMIT 3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create our training dataset, we run a transform procedure to apply the TensorFlow model to all images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "print mldb.post(\"/v1/procedures\", {\n",
    "    \"type\": \"transform\",\n",
    "    \"params\": {\n",
    "        \"inputData\": \"\"\"\n",
    "            SELECT brand,\n",
    "                   inception({url}) as *\n",
    "            FROM images\n",
    "        \"\"\",\n",
    "        \"outputDataset\": \"training_dataset\"\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This gives us the following result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>brand</th>\n",
       "      <th>pool_3.0.0.0.0</th>\n",
       "      <th>pool_3.0.0.0.1</th>\n",
       "      <th>pool_3.0.0.0.2</th>\n",
       "      <th>pool_3.0.0.0.3</th>\n",
       "      <th>pool_3.0.0.0.4</th>\n",
       "      <th>pool_3.0.0.0.5</th>\n",
       "      <th>pool_3.0.0.0.6</th>\n",
       "      <th>pool_3.0.0.0.7</th>\n",
       "      <th>pool_3.0.0.0.8</th>\n",
       "      <th>...</th>\n",
       "      <th>pool_3.0.0.0.2038</th>\n",
       "      <th>pool_3.0.0.0.2039</th>\n",
       "      <th>pool_3.0.0.0.2040</th>\n",
       "      <th>pool_3.0.0.0.2041</th>\n",
       "      <th>pool_3.0.0.0.2042</th>\n",
       "      <th>pool_3.0.0.0.2043</th>\n",
       "      <th>pool_3.0.0.0.2044</th>\n",
       "      <th>pool_3.0.0.0.2045</th>\n",
       "      <th>pool_3.0.0.0.2046</th>\n",
       "      <th>pool_3.0.0.0.2047</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>127</th>\n",
       "      <td>bmw</td>\n",
       "      <td>0.273366</td>\n",
       "      <td>0.488088</td>\n",
       "      <td>0.243253</td>\n",
       "      <td>0.293343</td>\n",
       "      <td>0.172615</td>\n",
       "      <td>0.168309</td>\n",
       "      <td>0.421383</td>\n",
       "      <td>0.327155</td>\n",
       "      <td>1.005386</td>\n",
       "      <td>...</td>\n",
       "      <td>0.018846</td>\n",
       "      <td>0.089321</td>\n",
       "      <td>0.341528</td>\n",
       "      <td>1.002636</td>\n",
       "      <td>0.483638</td>\n",
       "      <td>0.030398</td>\n",
       "      <td>0.017882</td>\n",
       "      <td>0.212639</td>\n",
       "      <td>0.282181</td>\n",
       "      <td>0.225709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108</th>\n",
       "      <td>bmw</td>\n",
       "      <td>0.433248</td>\n",
       "      <td>0.732170</td>\n",
       "      <td>0.282846</td>\n",
       "      <td>0.142594</td>\n",
       "      <td>0.142779</td>\n",
       "      <td>0.172190</td>\n",
       "      <td>0.231635</td>\n",
       "      <td>0.026957</td>\n",
       "      <td>0.271389</td>\n",
       "      <td>...</td>\n",
       "      <td>0.451374</td>\n",
       "      <td>0.189907</td>\n",
       "      <td>0.286528</td>\n",
       "      <td>0.394389</td>\n",
       "      <td>0.602661</td>\n",
       "      <td>0.053008</td>\n",
       "      <td>0.575223</td>\n",
       "      <td>0.344477</td>\n",
       "      <td>0.370394</td>\n",
       "      <td>0.031122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95</th>\n",
       "      <td>bmw</td>\n",
       "      <td>0.068673</td>\n",
       "      <td>0.441297</td>\n",
       "      <td>0.342519</td>\n",
       "      <td>0.020982</td>\n",
       "      <td>0.224962</td>\n",
       "      <td>0.301158</td>\n",
       "      <td>0.285375</td>\n",
       "      <td>0.053405</td>\n",
       "      <td>0.834576</td>\n",
       "      <td>...</td>\n",
       "      <td>0.497007</td>\n",
       "      <td>0.361395</td>\n",
       "      <td>0.186176</td>\n",
       "      <td>0.748922</td>\n",
       "      <td>0.359960</td>\n",
       "      <td>0.053431</td>\n",
       "      <td>0.146289</td>\n",
       "      <td>0.509478</td>\n",
       "      <td>0.515487</td>\n",
       "      <td>0.044757</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 2049 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         brand  pool_3.0.0.0.0  pool_3.0.0.0.1  pool_3.0.0.0.2  \\\n",
       "_rowName                                                         \n",
       "127        bmw        0.273366        0.488088        0.243253   \n",
       "108        bmw        0.433248        0.732170        0.282846   \n",
       "95         bmw        0.068673        0.441297        0.342519   \n",
       "\n",
       "          pool_3.0.0.0.3  pool_3.0.0.0.4  pool_3.0.0.0.5  pool_3.0.0.0.6  \\\n",
       "_rowName                                                                   \n",
       "127             0.293343        0.172615        0.168309        0.421383   \n",
       "108             0.142594        0.142779        0.172190        0.231635   \n",
       "95              0.020982        0.224962        0.301158        0.285375   \n",
       "\n",
       "          pool_3.0.0.0.7  pool_3.0.0.0.8        ...          \\\n",
       "_rowName                                        ...           \n",
       "127             0.327155        1.005386        ...           \n",
       "108             0.026957        0.271389        ...           \n",
       "95              0.053405        0.834576        ...           \n",
       "\n",
       "          pool_3.0.0.0.2038  pool_3.0.0.0.2039  pool_3.0.0.0.2040  \\\n",
       "_rowName                                                            \n",
       "127                0.018846           0.089321           0.341528   \n",
       "108                0.451374           0.189907           0.286528   \n",
       "95                 0.497007           0.361395           0.186176   \n",
       "\n",
       "          pool_3.0.0.0.2041  pool_3.0.0.0.2042  pool_3.0.0.0.2043  \\\n",
       "_rowName                                                            \n",
       "127                1.002636           0.483638           0.030398   \n",
       "108                0.394389           0.602661           0.053008   \n",
       "95                 0.748922           0.359960           0.053431   \n",
       "\n",
       "          pool_3.0.0.0.2044  pool_3.0.0.0.2045  pool_3.0.0.0.2046  \\\n",
       "_rowName                                                            \n",
       "127                0.017882           0.212639           0.282181   \n",
       "108                0.575223           0.344477           0.370394   \n",
       "95                 0.146289           0.509478           0.515487   \n",
       "\n",
       "          pool_3.0.0.0.2047  \n",
       "_rowName                     \n",
       "127                0.225709  \n",
       "108                0.031122  \n",
       "95                 0.044757  \n",
       "\n",
       "[3 rows x 2049 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"SELECT * FROM training_dataset LIMIT 3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Training a classifier\n",
    "\n",
    "Let's now train a model. We'll use a 50/50 split for training and testing, and use a random forest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "rez = mldb.post(\"/v1/procedures\", {\n",
    "    \"type\": \"classifier.experiment\",\n",
    "    \"params\": {\n",
    "        \"experimentName\": \"car_brand_cls\",\n",
    "        \"inputData\": \"\"\"        \n",
    "            SELECT \n",
    "                {* EXCLUDING(brand)} as features,\n",
    "                brand as label\n",
    "            FROM training_dataset\n",
    "        \"\"\",\n",
    "        \"mode\": \"categorical\",\n",
    "        \"modelFileUrlPattern\": \"file:///mldb_data/car_brand_cls.cls\",\n",
    "         \"configuration\": {\n",
    "            \"type\": \"bagging\",\n",
    "            \"weak_learner\": {\n",
    "                \"type\": \"boosting\",\n",
    "                \"weak_learner\": {\n",
    "                    \"type\": \"decision_tree\",\n",
    "                    \"max_depth\": 5,\n",
    "                    \"update_alg\": \"gentle\",\n",
    "                    \"random_feature_propn\": 0.6\n",
    "                },\n",
    "                \"min_iter\": 5,\n",
    "                \"max_iter\": 30\n",
    "            },\n",
    "            \"num_bags\": 15\n",
    "        }\n",
    "    }\n",
    "})\n",
    "\n",
    "runResults = rez.json()[\"status\"][\"firstRun\"][\"status\"][\"folds\"][0][\"resultsTest\"]\n",
    "print rez"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Let's look at our results on the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>predicted</th>\n",
       "      <th>audi</th>\n",
       "      <th>bmw</th>\n",
       "      <th>tesla</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>actual</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>audi</th>\n",
       "      <td>19</td>\n",
       "      <td>13</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bmw</th>\n",
       "      <td>3</td>\n",
       "      <td>24</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tesla</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>33</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          count          \n",
       "predicted  audi bmw tesla\n",
       "actual                   \n",
       "audi         19  13     4\n",
       "bmw           3  24     9\n",
       "tesla         3   4    33"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(runResults[\"confusionMatrix\"])\\\n",
    "    .pivot_table(index=\"actual\", columns=\"predicted\", fill_value=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>accuracy</th>\n",
       "      <th>f1Score</th>\n",
       "      <th>precision</th>\n",
       "      <th>recall</th>\n",
       "      <th>support</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>audi</th>\n",
       "      <td>0.794643</td>\n",
       "      <td>0.622951</td>\n",
       "      <td>0.760000</td>\n",
       "      <td>0.527778</td>\n",
       "      <td>36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bmw</th>\n",
       "      <td>0.741071</td>\n",
       "      <td>0.623377</td>\n",
       "      <td>0.585366</td>\n",
       "      <td>0.666667</td>\n",
       "      <td>36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tesla</th>\n",
       "      <td>0.821429</td>\n",
       "      <td>0.767442</td>\n",
       "      <td>0.717391</td>\n",
       "      <td>0.825000</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       accuracy   f1Score  precision    recall  support\n",
       "audi   0.794643  0.622951   0.760000  0.527778       36\n",
       "bmw    0.741071  0.623377   0.585366  0.666667       36\n",
       "tesla  0.821429  0.767442   0.717391  0.825000       40"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame.from_dict(runResults[\"labelStatistics\"]).transpose()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Creating a real-time endpoint\n",
    "\n",
    "We create a function of type `sql.expression` that will represent our pipeline and that we call `brand_predictor`. It takes the URL to an image, passes it through the `inception` model to extract the features, and then into the `car_brand_cls_scorer_0` function that represents our trained model and that was created at the previous step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "print mldb.put(\"/v1/functions/brand_predictor\", {\n",
    "    \"type\": \"sql.expression\",\n",
    "    \"params\": {\n",
    "        \"expression\": \"\"\"\n",
    "            car_brand_cls_scorer_0(\n",
    "                {\n",
    "                    features: inception({url})\n",
    "                }) as *\n",
    "        \"\"\"\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "We can now call this endpoint on new images and get predictions back:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<strong>GET http://localhost/v1/functions/brand_predictor/application</strong><br /><strong style=\"color: green;\">200 OK</strong><br /> <div class=\"highlight\"><pre style=\"line-height: 125%\"><span></span>{\n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;output&quot;</span>: {\n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;scores&quot;</span>: [\n",
       "      [\n",
       "        <span style=\"color: #0000dd\">&quot;\\&quot;audi\\&quot;&quot;</span>, \n",
       "        [\n",
       "          <span style=\"color: #0000dd\">-8.133334159851074</span>, \n",
       "          <span style=\"color: #0000dd\">&quot;2016-05-05T04:18:03Z&quot;</span>\n",
       "        ]\n",
       "      ], \n",
       "      [\n",
       "        <span style=\"color: #0000dd\">&quot;\\&quot;bmw\\&quot;&quot;</span>, \n",
       "        [\n",
       "          <span style=\"color: #0000dd\">-7.200000286102295</span>, \n",
       "          <span style=\"color: #0000dd\">&quot;2016-05-05T04:18:03Z&quot;</span>\n",
       "        ]\n",
       "      ], \n",
       "      [\n",
       "        <span style=\"color: #0000dd\">&quot;\\&quot;tesla\\&quot;&quot;</span>, \n",
       "        [\n",
       "          <span style=\"color: #0000dd\">1.0666667222976685</span>, \n",
       "          <span style=\"color: #0000dd\">&quot;2016-05-05T04:18:03Z&quot;</span>\n",
       "        ]\n",
       "      ]\n",
       "    ]\n",
       "  }\n",
       "}\n",
       "</pre></div>\n"
      ],
      "text/plain": [
       "<Response [200]>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# good tesla: http://www.automobile-propre.com/wp-content/uploads/2016/09/tesla-premiere-livraison-france-657x438.jpg\n",
    "# good tesla: http://insideevs.com/wp-content/uploads/2016/03/JL82776-750x500.jpg\n",
    "# good bmw: http://www.bmwhk.com/content/dam/bmw/common/all-models/1-series/5-door/2015/images-and-videos/bmw-1-series-wallpaper-1920x1200-03-R.jpg/jcr:content/renditions/cq5dam.resized.img.485.low.time1448014414633.jpg\n",
    "\n",
    "mldb.get(\"/v1/functions/brand_predictor/application\", \n",
    "         data={'input': \n",
    "               {'url': 'http://insideevs.com/wp-content/uploads/2016/03/JL82776-750x500.jpg'}})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Where to next?\n",
    "\n",
    "You can now look at the full [Transfer Learning with Tensorflow](../../../../doc/nblink.html#_demos/Transfer Learning with Tensorflow) demo, or check out the other [Tutorials and Demos](../../../../doc/#builtin/Demos.md.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
